Load the tidyverse package.
library(tidyverse)
data <- read_csv("data/chds6162_data.csv")
## Rows: 1236 Columns: 23
## ── Column specification ─────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): drace
## dbl (22): id, pluralty, outcome, date, gestation, sex, wt, parity, race, age, ed, ht, wt.1, dage, ded...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
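The message above suggests declaring the column types yourself. As a minimal sketch, the example below reads a few made-up rows from inline text (standing in for the real file) and passes `col_types`, which also quiets the message. The three columns are only a subset chosen for illustration, and `csv_text` and `toy` are hypothetical names.

```r
library(readr)

# Inline CSV text with made-up rows, standing in for the real file
csv_text <- I("id,gestation,drace\n1,284,a\n2,282,b\n")

# Declaring the column types up front suppresses the specification message
toy <- read_csv(
  csv_text,
  col_types = cols(
    id        = col_double(),
    gestation = col_double(),
    drace     = col_character()
  )
)
toy
```

If you only want to silence the message and keep the guessed types, `show_col_types = FALSE` is the option named in the message itself.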
The select function picks variables (columns) out of a larger data frame.
Use select to keep just the gestation variable.
data_ges <- data %>%
  select(gestation)
We can also select a range of columns.
Select all the variables that belong to the father (they
have a “d” in front of them), from drace to dwt.
data %>%
  select(drace:dwt)
# What about just the id column and everything after the father information?
data %>%
  select(id, marital:last_col())
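A range like `drace:dwt` works by column position, not by name. The sketch below uses a toy tibble with made-up values and only a few of the column names to show the difference. The comparison with `starts_with("d")` is an addition for illustration, and `toy`, `fathers`, and `by_prefix` are hypothetical names.

```r
library(dplyr)

# Toy data with made-up values; the column names mimic part of the real data
toy <- tibble(
  id      = 1:3,
  date    = c(1411, 1499, 1576),
  drace   = c("a", "b", "a"),
  dage    = c(31, 38, 32),
  dwt     = c(110, 148, 190),
  marital = c(1, 1, 1)
)

# Range selection: every column from drace through dwt, in data frame order
fathers <- toy %>%
  select(drace:dwt)
names(fathers)

# A name-based helper such as starts_with("d") would also pick up `date`
by_prefix <- toy %>%
  select(starts_with("d"))
names(by_prefix)
```

The range is the safer choice here because not every column that starts with “d” describes the father.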
We can drop variables by putting a minus sign in front of the name (-var).
Drop the marital variable.
data %>%
  select(-marital)
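The same minus sign can drop several columns at once if you wrap the names in `c()`. This is a minimal sketch on a toy tibble with made-up values. The column names other than `id` and `marital` are assumptions chosen for illustration.

```r
library(dplyr)

# Toy data with made-up values; `inc` and `smoke` are illustrative columns
toy <- tibble(
  id      = 1:2,
  marital = c(1, 1),
  inc     = c(3, 5),
  smoke   = c(0, 1)
)

# Drop several columns at once by negating a vector of names
dropped <- toy %>%
  select(-c(marital, inc))
names(dropped)
```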
We use mutate to make new variables or change existing
ones.
Create a new variable with a specific value
Create a new variable called data_decade. Imagine that
you will be merging this dataset from 1961-62 with a dataset from the 1970s. To
make that easier, create this variable with the value “60s”.
data %>%
  mutate(data_decade = "60s")
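To see why the constant column helps, here is a rough sketch of the combining step. It assumes that "merging" means stacking the rows of the two datasets with `bind_rows`, which the text does not confirm. Both tibbles contain made-up values, and `births_60s`, `births_70s`, and `combined` are hypothetical names.

```r
library(dplyr)

# Two toy datasets with made-up values, standing in for the two studies
births_60s <- tibble(id = 1:2, gestation = c(284, 282)) %>%
  mutate(data_decade = "60s")
births_70s <- tibble(id = 1:2, gestation = c(279, 290)) %>%
  mutate(data_decade = "70s")

# Stack the rows; data_decade records which dataset each row came from
combined <- bind_rows(births_60s, births_70s)
combined
```

Without data_decade, the repeated id values would make it impossible to tell which rows came from which dataset.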
Create a new variable based on other variables
Create a new variable called wt_k. This variable will
give you mom’s pre-pregnancy weight (wt)
in kilos (1 pound = 0.454 kilos).
data %>%
  mutate(wt_k = wt * 0.454)
# too many decimals? let's round things
data %>%
  mutate(wt_k = round(wt * 0.454, 2))
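mutate can also change an existing variable, and later arguments can use columns created earlier in the same call. The sketch below does the conversion and the rounding as two steps on a toy tibble with made-up weights. Rounding to one decimal is just an illustrative choice, and `toy` and `converted` are hypothetical names.

```r
library(dplyr)

# Toy data: made-up weights in pounds
toy <- tibble(wt = c(100, 135, 128))

converted <- toy %>%
  mutate(
    wt_k = wt * 0.454,     # create the new column
    wt_k = round(wt_k, 1)  # then overwrite it with the rounded values
  )
converted
```

Because the second argument reuses the name wt_k, the result still has only two columns: wt and the rounded wt_k.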